Goto

Collaborating Authors

 time sery clustering


Learning Representations for Time Series Clustering

Neural Information Processing Systems

Time series clustering is an essential unsupervised technique in cases when category information is not available. It has been widely applied to genome data, anomaly detection, and in general, in any domain where pattern detection is important. Although feature-based time series clustering methods are robust to noise and outliers, and can reduce the dimensionality of the data, they typically rely on domain knowledge to manually construct high-quality features. Sequence to sequence (seq2seq) models can learn representations from sequence data in an unsupervised manner by designing appropriate learning objectives, such as reconstruction and context prediction. When applying seq2seq to time series clustering, obtaining a representation that effectively represents the temporal dynamics of the sequence, multi-scale features, and good clustering properties remains a challenge.


Coresets for Time Series Clustering

Neural Information Processing Systems

We study the problem of constructing coresets for clustering problems with time series data. This problem has gained importance across many fields including biology, medicine, and economics due to the proliferation of sensors facilitating real-time measurement and rapid drop in storage costs. In particular, we consider the setting where the time series data on $N$ entities is generated from a Gaussian mixture model with autocorrelations over $k$ clusters in $\mathbb{R}^d$. Our main contribution is an algorithm to construct coresets for the maximum likelihood objective for this mixture model. Our algorithm is efficient, and under a mild boundedness assumption on the covariance matrices of the underlying Gaussians, the size of the coreset is independent of the number of entities $N$ and the number of observations for each entity, and depends only polynomially on $k$, $d$ and $1/\varepsilon$, where $\varepsilon$ is the error parameter. We empirically assess the performance of our coreset with synthetic data.


Mask the Redundancy: Evolving Masking Representation Learning for Multivariate Time-Series Clustering

Tan, Zexi, Luo, Xiaopeng, Liu, Yunlin, Zhang, Yiqun

arXiv.org Artificial Intelligence

Multivariate Time-Series (MTS) clustering discovers intrinsic grouping patterns of temporal data samples. Although time-series provide rich discriminative information, they also contain substantial redundancy, such as steady-state machine operation records and zero-output periods of solar power generation. Such redundancy diminishes the attention given to discriminative timestamps in representation learning, thus leading to performance bottlenecks in MTS clustering. Masking has been widely adopted to enhance the MTS representation, where temporal reconstruction tasks are designed to capture critical information from MTS. However, most existing masking strategies appear to be standalone preprocess-ing steps, isolated from the learning process, which hinders dynamic adaptation to the importance of clustering-critical timestamps. Accordingly, this paper proposes the Evolving-masked MTS Clustering (EMTC) method, whose model architecture comprises Importance-aware V ariate-wise Masking (IVM) and Multi-Endogenous Views (MEV) generation modules. IVM adaptively guides the model in learning more discriminative representations for clustering, while the reconstruction and cluster-guided contrastive learning pathways enhance and connect the representation learning to clustering tasks. Extensive experiments on 15 benchmark datasets demonstrate the superiority of EMTC over eight SOT A methods, where the EMTC achieves an average improvement of 4.85% in F1-Score over the strongest baselines.


Learning Representations for Time Series Clustering

Neural Information Processing Systems

Time series clustering is an essential unsupervised technique in cases when category information is not available. It has been widely applied to genome data, anomaly detection, and in general, in any domain where pattern detection is important. Although feature-based time series clustering methods are robust to noise and outliers, and can reduce the dimensionality of the data, they typically rely on domain knowledge to manually construct high-quality features. Sequence to sequence (seq2seq) models can learn representations from sequence data in an unsupervised manner by designing appropriate learning objectives, such as reconstruction and context prediction. When applying seq2seq to time series clustering, obtaining a representation that effectively represents the temporal dynamics of the sequence, multi-scale features, and good clustering properties remains a challenge.


Reviews: Learning Representations for Time Series Clustering

Neural Information Processing Systems

The submission proposes a model for time-series clustering. The model is a novel combination of several existing components: a) a deep recurrent auto-encoder using dilated RNNs, b) a spectral relaxation of the K-means objective and c) a self-supervision loss to discriminate time-series corrupted by random shuffling from the original ones. The model is evaluated on a common benchmark for time-series clustering and achieves superior performance to existing methods. Overall I feel positive about the proposed method as the quantitative results look promising and using the spectral relaxation of K-means for deep clustering is novel and original. Nevertheless I do have some concerns about the submission in its current form: 1.)


Reviews: Learning Representations for Time Series Clustering

Neural Information Processing Systems

This paper proposes a deep learning approach to clustering time series by combining a deep auto encoder and the spectral relaxation of K-means. The reviewers found the approach novel and the experimental evaluation of the approach reasonable. The concerns that the reviewers raised were addressed by the authors in their response. The authors should incorporate the suggestions that the reviewers provided to improve their paper for the camera-ready version.


Coresets for Time Series Clustering

Neural Information Processing Systems

We study the problem of constructing coresets for clustering problems with time series data. This problem has gained importance across many fields including biology, medicine, and economics due to the proliferation of sensors facilitating real-time measurement and rapid drop in storage costs. In particular, we consider the setting where the time series data on N entities is generated from a Gaussian mixture model with autocorrelations over k clusters in \mathbb{R} d . Our main contribution is an algorithm to construct coresets for the maximum likelihood objective for this mixture model. Our algorithm is efficient, and under a mild boundedness assumption on the covariance matrices of the underlying Gaussians, the size of the coreset is independent of the number of entities N and the number of observations for each entity, and depends only polynomially on k, d and 1/\varepsilon, where \varepsilon is the error parameter.


Learning Representations for Time Series Clustering

Neural Information Processing Systems

Time series clustering is an essential unsupervised technique in cases when category information is not available. It has been widely applied to genome data, anomaly detection, and in general, in any domain where pattern detection is important. Although feature-based time series clustering methods are robust to noise and outliers, and can reduce the dimensionality of the data, they typically rely on domain knowledge to manually construct high-quality features. Sequence to sequence (seq2seq) models can learn representations from sequence data in an unsupervised manner by designing appropriate learning objectives, such as reconstruction and context prediction. When applying seq2seq to time series clustering, obtaining a representation that effectively represents the temporal dynamics of the sequence, multi-scale features, and good clustering properties remains a challenge.


Time Series Clustering with General State Space Models via Stochastic Variational Inference

Ishizuka, Ryoichi, Imai, Takashi, Kawamoto, Kaoru

arXiv.org Artificial Intelligence

In this paper, we propose a novel method of model-based time series clustering with mixtures of general state space models (MSSMs). Each component of MSSMs is associated with each cluster. An advantage of the proposed method is that it enables the use of time series models appropriate to the specific time series. This not only improves clustering and prediction accuracy but also enhances the interpretability of the estimated parameters. The parameters of the MSSMs are estimated using stochastic variational inference, a subtype of variational inference. The proposed method estimates the latent variables of an arbitrary state space model by using neural networks with a normalizing flow as a variational estimator. The number of clusters can be estimated using the Bayesian information criterion. In addition, to prevent MSSMs from converging to the local optimum, we propose several optimization tricks, including an additional penalty term called entropy annealing. Experiments on simulated datasets show that the proposed method is effective for clustering, parameter estimation, and estimating the number of clusters.


Time Series Clustering for Stock Market Prediction in Python- Part 1

#artificialintelligence

Originally published on Towards AI the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. It's free, we don't spam, and we never share your email address.